Robust spoken dialogue systems for consumer products: a concrete application

Authors

Xavier Pouteau

Luis Arévalo

Abstract

In this paper, we report the significant results of a fully implemented voice-operated dialogue system, and particularly its main component: the Dialogue Manager (DM). Like other interfaces, spoken interfaces require a well-conducted design, implying a good analysis of the users’ needs throughout the dialogue. The VODIS project (see footnote 1) has led to the design and development of a spoken interface for the control of car equipment. Because driving the vehicle already imposes a workload, spoken communication provides a potentially safe and efficient mode of operating the car equipment. To achieve this, we present the main characteristics of the task model specified during the design stage, and show how its specific features related to spoken communication made it possible to implement a robust dialogue.

1. GENERAL REQUIREMENTS

To deal with the specificities of a spoken dialogue in the car, the activities within the project have focused on four main tasks: a well-performing automatic speech recognition (ASR) unit, the design of the interface, the integration of all system modules, and the task model, which serves as a backbone to model all relevant components of the interaction in the integrated system. Of those, we will especially describe the last two, which are closely related to the content of the Dialogue Manager [1]. First, however, we summarise the outcomes of the first two activities, taking into account the specificities of the in-car environment [2].

1.1. Signal processing for a well performing ASR

The vocal interface has been designed to operate robustly in an acoustically adverse environment: the speech captured by the far-talk microphone (mounted on the ceiling of the vehicle) is potentially corrupted by other speech or music signals stemming from the car-audio system, and by ambient noise due to the tires, wind, other vehicles, etc.
These distortions are tackled by two different signal-processing modules: the former by an acoustic echo canceller, the latter by a noise reduction scheme. The acoustic echo canceller operates on the signal coming directly from the far-talk microphone and the audio signal played by the loudspeakers. The electro-acoustic path loudspeaker-room-microphone is modelled by a 450-tap FIR filter whose coefficients are adapted by an NLMS algorithm enhanced by special measures to ensure fast convergence and stability in a noisy environment [3]. Thus robust voice operation is feasible even if the driver is listening to an audio source of considerable volume level. This module has been implemented on a DSP board plugged into the PC that hosts the entire vocal interface.

Footnote 1: For more information, please refer to http://werner.ira.uka.de/VODIS, and to [6].

The noise reduction module has been merged with the feature extractor of the speech recogniser, which is fed with the signal coming from the echo canceller. A spectral subtraction scheme as discussed in [4] is applied to the mel spectral coefficients. The noise power in each frequency band is estimated following the principle of minimum statistics, i.e. by observing the minima in a smoothed version of the power spectral density.

1.2. Interface design

The design of the interface has primarily been influenced by the choice of speech as the main mode of human-machine interaction [5]. This choice is based on two motivations:

• Since the driving task puts heavy demands on the users’ gestural channel (their hands), using speech to operate the system does not require additional use of that channel.
• Though the number of functionalities available on the interface increases with the sophistication of car equipment components (the complete system used in VODIS offers more than 80 functionalities), the space available on the dashboard is generally very limited.
Opting for a tactile interface would therefore force a dialogue design with a large number of sub-menus, and would place a high constraint on the driver’s visual channel.

To limit the constraints on the user’s visual channel, another important point is the amount of information to provide to the user, and the form in which to convey it. In the remainder of the paper, we will refer to conveyed information as feedback, although the definition of feedback itself is more specific than the global notion of providing information. As always, the dilemma between spoken feedback, visual feedback, or a combination of both had to be resolved. On the one hand, spoken messages, via text-to-speech synthesis (TTS), are perceived by users without distracting their visual attention. On the other hand, TTS messages are transient, so feedback is only accessible at the time the message is spoken. Furthermore, spoken messages are undoubtedly intrusive, while visual feedback is accessed only when users decide to look at the screen. To provide the user with the “right” feedback, every dialogue situation has been carefully studied, so as to define:

5th International Conference on Spoken Language Processing (ICSLP 98), Sydney, Australia, November 30 - December 4, 1998. ISCA Archive: http://www.isca-speech.org/archive
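As an illustration of the echo canceller described in Section 1.1, the following is a minimal sketch of a plain NLMS adaptive FIR filter in Python. It is not the project's DSP implementation: the special measures for fast convergence and stability mentioned in [3] are not reproduced, and the function names (nlms_echo_cancel, simulate_echo), the step size mu, the regularisation constant eps, and the three-tap toy echo path are all assumptions chosen for illustration. The demo uses 8 taps rather than 450 only to keep the pure-Python run time short.

```python
import random


def simulate_echo(far_end, echo_path):
    """Toy loudspeaker-room-microphone path: convolve far_end with echo_path."""
    mic = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h_k in enumerate(echo_path):
            if n - k >= 0:
                acc += h_k * far_end[n - k]
        mic.append(acc)
    return mic


def nlms_echo_cancel(far_end, mic, num_taps=450, mu=0.5, eps=1e-6):
    """Plain NLMS: returns (residual signal, final filter coefficients).

    far_end: samples played by the loudspeakers (reference signal)
    mic:     samples picked up by the microphone (echo, plus speech if any)
    mu, eps: step size and regularisation (illustrative values, assumed)
    """
    w = [0.0] * num_taps   # adaptive FIR coefficients
    x = [0.0] * num_taps   # most recent reference samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x.pop()
        x.insert(0, ref)
        echo_estimate = sum(w_i * x_i for w_i, x_i in zip(w, x))
        e = d - echo_estimate                      # echo-cancelled sample
        norm = sum(x_i * x_i for x_i in x) + eps   # input power normalisation
        gain = mu * e / norm
        for i in range(num_taps):
            w[i] += gain * x[i]                    # normalised LMS update
        residual.append(e)
    return residual, w


if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    path = [0.6, -0.3, 0.1]                        # hypothetical echo path
    out, coeffs = nlms_echo_cancel(far, simulate_echo(far, path), num_taps=8)
    print("estimated leading taps:", [round(c, 3) for c in coeffs[:3]])
    print("residual energy, last 100 samples:", sum(e * e for e in out[-100:]))
```

In this noise-free toy setting the adapted coefficients approach the simulated echo path and the residual decays towards zero. In the car, the microphone signal also contains the driver's speech and ambient noise, which is the situation the additional measures cited in [3] are meant to handle.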


Similar resources

Robust and adaptive architecture for multilingual spoken dialogue systems

We present how robustness and adaptivity can be supported by the spoken dialogue system architecture. AthosMail is a multilingual spoken dialogue system for e-mail domain. It is being developed in the EU-funded DUMAS project. It has flexible system architecture supporting multiple components for input interpretation, dialogue management and output generation. In addition to language differences...

Toward Spoken Dialogue as Mutual Agreement

This paper re-envisions human-machine dialogue as a set of mutual agreements between a person and a computer. The intention is to provide the person with a habitable experience that accomplishes her goals, and to provide the computer with sufficient flexibility and intuition to support them. The application domain is particularly challenging: for its vocabulary size, for the number and variety ...

Error recovery for robust language understanding in spoken dialogue systems

In this paper, we proposed an example-based approach aiming at recovering ill-formed inputs to improve robustness of spoken dialogue systems. In this approach, a treebank, which contains example sentences and their correct parse trees, is used to provide clues for fixing the errors of ill-formed inputs. Particularly, the proposed error recovery method is suitable for spoken dialogue application...

Robust and efficient semantic parsing of free word order languages in spoken dialogue systems

This paper presents a semantic parser for spoken dialogue systems. The parser is designed especially for the analysis of free word order languages by providing a feature called order-independent matching. We describe how this feature allows writing of rules for free word order languages in an elegant way (using German as example language) and how it increases the robustness against speech recogn...

Studies on Robust Language and Dialogue Processing for Spoken Dialogue Systems

In spoken dialogue systems, robust language processing for spontaneous speech understanding and robust dialogue processing for achieving user goal are inevitable. Previously, research of speech recognition and research of natural language understanding were done independently. At first glance, it seems to be no problem to combine these two technologies, because the purpose of speech recognition...

Designing a Portable Spoken Dialogue System

Spoken dialogue systems enable the construction of complex applications involving extended, meaningful interactions with users. Building an effective, generic dialogue system requires techniques and expertise from a number of areas such as natural language, computer-human interaction, and information systems. A key challenge is to design a system through which user-friendly applications can be c...

Publication date: 1998
